Paradigmatic Modifiability Statistics for the Extraction of Complex Multi-Word Terms

Authors

  • Joachim Wermter
  • Udo Hahn
Abstract

We propose a new method that separates domain-specific terminology from common, non-specific noun phrases. It is based on the observation that terminological multi-word groups show considerably less distributional variation than non-specific noun phrases. We define a measure of the observable degree of paradigmatic modifiability of terms and test it on bigram, trigram and quadgram noun phrases extracted from a 104-million-word biomedical text corpus. Using a community-wide curated biomedical terminology system as an evaluation gold standard, we show that our algorithm significantly outperforms a variety of standard term identification measures. We also provide empirical evidence that our methodology is essentially domain- and corpus-size-independent.
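The abstract does not give the measure itself. As a rough illustration of the underlying observation only, the sketch below scores an n-gram by how rarely each of its slots is filled by a different token while the other tokens stay fixed. The function name `slot_modifiability_score`, the single-slot restriction, and the toy counts are all assumptions made for this example, not the authors' definition.

```python
from collections import Counter
from typing import Mapping, Tuple

NGram = Tuple[str, ...]

def slot_modifiability_score(ngram: NGram, counts: Mapping[NGram, int]) -> float:
    """Toy score in (0, 1]; higher means the n-gram is harder to modify.

    Illustrative assumption, not the paper's formula: for each slot, divide
    the n-gram's frequency by the frequency of all same-length n-grams that
    match it everywhere except in that slot, then multiply the ratios.
    """
    target = counts.get(ngram, 0)
    if target == 0:
        return 0.0  # unseen candidates get the lowest score
    n = len(ngram)
    score = 1.0
    for slot in range(n):
        # Total frequency of n-grams agreeing with `ngram` outside `slot`.
        total = sum(
            c for other, c in counts.items()
            if len(other) == n
            and all(other[i] == ngram[i] for i in range(n) if i != slot)
        )
        score *= target / total
    return score

# Hypothetical trigram counts, invented for the example.
toy_counts = Counter({
    ("open", "reading", "frame"): 8,
    ("open", "window", "frame"): 2,
    ("the", "large", "number"): 2,
    ("the", "small", "number"): 2,
    ("a", "large", "number"): 2,
    ("the", "large", "group"): 2,
})

term_score = slot_modifiability_score(("open", "reading", "frame"), toy_counts)
phrase_score = slot_modifiability_score(("the", "large", "number"), toy_counts)
```

On these toy counts the term-like trigram scores higher than the ordinary noun phrase, because its slots are almost never filled by other tokens. A real setup would need noun-phrase chunking and corpus-scale counts, which this sketch leaves out.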

Similar resources

On multiword lexical units and their role in maritime dictionaries

Multi-word lexical units are a typical feature of specialized dictionaries, in particular monolingual and bilingual maritime dictionaries. The paper studies the concept of the multi-word lexical unit and considers the similarities and differences of their selection and presentation in monolingual and bilingual maritime dictionaries. The work analyses such issues as the classification of multi-w...

Term Extraction and Mining of Term Relations from Unrestricted Texts in the Financial Domain

In this paper, we present an unsupervised hybrid text-mining approach to automatic acquisition of domain relevant terms and their relations. We deploy the TF-IDF-based term classification method to acquire domain relevant terms. Further, we apply two strategies in order to learn lexico-syntactic patterns which indicate paradigmatic and domain relevant syntagmatic relations between the extracted ter...

Identifying Terms by their Family and Friends

Multi-word terms are traditionally identified using statistical techniques or, more recently, using hybrid techniques combining statistics with shallow linguistic information. Approaches to word sense disambiguation and machine translation have taken advantage of contextual information in a more meaningful way, but terminology has rarely followed suit. We present an approach to term recognition ...

Collocation Extraction Based on Modifiability Statistics

We introduce a new, linguistically grounded measure of collocativity based on the property of limited modifiability and test it on German PP-verb combinations. We show that our measure not only significantly outperforms the standard lexical association measures typically employed for collocation extraction, but also yields a valuable by-product for the creation of collocation databases, viz. po...

Combining Word Patterns and Discourse Markers for Paradigmatic Relation Classification

Distinguishing between paradigmatic relations such as synonymy, antonymy and hypernymy is an important prerequisite in a range of NLP applications. In this paper, we explore discourse relations as an alternative set of features to lexico-syntactic patterns. We demonstrate that statistics over discourse relations, collected via explicit discourse markers as proxies, can be utilized as salient in...

Publication date: 2005